Improving EigenVoices-based techniques and SMLLR for Speaker Adaptation by combining EV and SMLLR techniques or using Genetic Algorithms

Authors

  • Fabrice Lauri
  • Irina Illina
  • Dominique Fohr
Abstract

This paper studies several classical and original methods for speaker adaptation of the acoustic hidden Markov models of an automatic speech recognition system (ASRS). Most of today’s real applications require that the speaker adaptation process continuously improve the performance of the underlying ASRS as a new speaker pronounces more utterances. The first part of this article is dedicated to this problem. We begin by introducing the Structural EigenVoices approach (SEV). Compared to EigenVoices (EV), SEV keeps improving the performance of an ASRS as more sentences become available, well beyond the point where the EV system has reached its limit. We then describe four methods that combine the advantages of Structural Maximum Likelihood Linear Regression (SMLLR) and EigenVoices-based techniques (EV or SEV). We show experimentally that one of them, SEV→SMLLR, can improve the performance of an ASRS at least as significantly as SMLLR, EV and SEV, irrespective of the number of adaptation utterances used. The second part of our work focuses on the use of genetic algorithms for rapidly adapting acoustic models. Whereas all of the standard adaptation methods (e.g. SMLLR, SMAP, EV, etc.) are based on the expectation-maximization (EM) procedure and thus provide a single locally optimal solution, genetic algorithms are theoretically able to provide several globally optimal solutions. We show experimentally that: (1) genetic algorithms and EV improve the performance of an ASRS to an equivalent degree, and (2) combining genetic algorithms and EV further improves the performance of an ASRS.

Similar resources

Combining EigenVoices and structural MLLR for speaker adaptation

This paper considers the problem of speaker adaptation of acoustic models in speech recognition. We have investigated four different possible methods which integrate the concepts of both Structural Maximum Likelihood Linear Regression (SMLLR) and the EigenVoices technique to adapt the Gaussian means of the speaker-independent models for a new speaker. The experiments were evaluated using the speech...

Using genetic algorithms for rapid speaker adaptation

This paper proposes two new approaches to rapid speaker adaptation of acoustic models by using genetic algorithms. Whereas conventional speaker adaptation techniques yield adapted models which represent locally optimal solutions, genetic algorithms are capable of providing multiple optimal solutions, thereby delivering potentially more robust adapted models. We have investigated two different strat...
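
As a rough illustration of the idea of evolving a population of candidate solutions, the sketch below runs a toy genetic algorithm. It is not the method of the paper: the fitness function (squared error between a target mean vector and a weighted sum of basis vectors), the names `error` and `evolve`, and every parameter value are assumptions chosen only to keep the example small. A real adaptation system would presumably score candidates by the likelihood of the adaptation data instead.

```python
import random

def error(weights, basis, target):
    """Squared error between target and the weighted sum of basis vectors."""
    total = 0.0
    for d, t in enumerate(target):
        combined = sum(w * b[d] for w, b in zip(weights, basis))
        total += (combined - t) ** 2
    return total

def evolve(basis, target, pop_size=30, generations=60, sigma=0.3, seed=0):
    """Toy genetic algorithm; returns (best weights, best error per generation)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    k = len(basis)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(k)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda w: error(w, basis, target))
        history.append(error(population[0], basis, target))
        # Selection: the better half survives unchanged (elitism).
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation.
            child = [rng.choice(pair) + rng.gauss(0.0, sigma)
                     for pair in zip(a, b)]
            children.append(child)
        population = elite + children
    population.sort(key=lambda w: error(w, basis, target))
    history.append(error(population[0], basis, target))
    return population[0], history

# Hypothetical toy data: two basis vectors and a target they can represent.
basis = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
target = [0.5, -0.25, 0.25]
best_weights, history = evolve(basis, target)
```

Because the better half of each generation is carried over unchanged, the best error recorded in `history` never increases from one generation to the next. The final population also holds several distinct candidates rather than a single point estimate.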

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Eigenvoices for speaker adaptation

We have devised a new class of fast adaptation techniques for speech recognition, based on prior knowledge of speaker variation. To obtain this prior knowledge, one applies Principal Component Analysis (PCA) [9] or a similar technique to a training set of T vectors of dimension D derived from T speaker-dependent (SD) models. This offline step yields T basis vectors, which we call “eigenvoices” ...
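
The offline PCA step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it extracts only the leading principal direction by power iteration, and it places a new speaker by a plain least-squares projection rather than a likelihood-based estimate. The function names and the toy supervectors are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def top_eigenvoice(supervectors, iterations=200):
    """Return (mean, unit-length leading principal direction)."""
    mean = mean_vector(supervectors)
    centered = [[x - m for x, m in zip(v, mean)] for v in supervectors]
    dim = len(mean)
    direction = [1.0 / math.sqrt(dim)] * dim  # arbitrary starting guess
    for _ in range(iterations):
        # Apply the scatter matrix X^T X without forming it explicitly.
        projections = [sum(c * d for c, d in zip(row, direction))
                       for row in centered]
        new_dir = [sum(p * row[j] for p, row in zip(projections, centered))
                   for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in new_dir))
        if norm == 0.0:
            break  # no variance left along the current direction
        direction = [x / norm for x in new_dir]
    return mean, direction

def adapt(new_vector, mean, direction):
    """Project a new speaker's vector onto the line mean + w * direction."""
    weight = sum((x - m) * d for x, m, d in zip(new_vector, mean, direction))
    return [m + weight * d for m, d in zip(mean, direction)]

# Hypothetical toy data: T = 4 "speaker-dependent" vectors of dimension D = 3,
# varying mostly along the direction (1, 2, 0).
supervectors = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.1],
    [3.0, 6.0, -0.1],
    [4.0, 8.0, 0.0],
]
mean, direction = top_eigenvoice(supervectors)
adapted = adapt([5.0, 10.0, 0.0], mean, direction)
```

With a single eigenvoice, a new speaker is described by one weight instead of D free parameters, which is what makes this family of methods usable with very little adaptation data.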

Publication date: 2014